Implementation of Watch Dog Timer for Fault Tolerant Computing on Cluster Server

Author

  • Rajendra M. Patrikar
Abstract

In today's technology era, clusters have become a necessity for modern computing and data applications, since many applications need a long time (even days or months) to complete their computation. Although parallelization speeds up computation, the time required by many applications can still be long. The reliability of the cluster therefore becomes a very important issue, and implementing a fault-tolerance mechanism becomes essential. The difficulty of designing a fault-tolerant cluster system increases with the difficulty of the failures it must handle. The most important requirement is that an algorithm that handles a simple failure in the system must also tolerate more severe failures. In this paper, we implement the watchdog timer concept in a parallel environment to handle failures. The simple algorithm implemented in our project handles different types of failures, and we found that the reliability of the cluster improves as a result.

Keywords: Cluster, Fault tolerant, Grid, Grid Computing System, Meta-computing.
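The abstract does not describe the algorithm itself, so the following is only a generic sketch of the watchdog timer idea, not the authors' implementation. It assumes a heartbeat scheme in which each cluster node periodically "kicks" the watchdog and any node that stays silent longer than a timeout is treated as failed. The class name `Watchdog`, its methods, the node names, and the 5-second timeout are all hypothetical.

```python
import time


class Watchdog:
    """Tracks the last heartbeat of each node and reports nodes that timed out."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # silence longer than this counts as a failure
        self.clock = clock              # injectable clock, so the sketch can be tested
        self.last_seen = {}             # node name -> time of its last heartbeat

    def kick(self, node):
        """Record a heartbeat from a node, which resets its timer."""
        self.last_seen[node] = self.clock()

    def expired(self):
        """Return the sorted names of nodes whose timer has run out."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
wd = Watchdog(timeout_s=5.0, clock=lambda: fake_now[0])
wd.kick("node-a")
wd.kick("node-b")

fake_now[0] = 4.0
wd.kick("node-a")        # node-a reports in again; node-b stays silent

fake_now[0] = 6.0
failed = wd.expired()    # node-b was last seen at t=0, which is more than 5 s ago
print(failed)
```

In a real cluster the recovery step (for example restarting the job or reassigning its work) would be triggered for each node returned by `expired()`; that part is left out because the abstract does not say how recovery is done.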


Similar resources

Fault tolerant system with imperfect coverage, reboot and server vacation

This study is concerned with the performance modeling of a fault tolerant system consisting of operating units supported by a combination of warm and cold spares. The on-line as well as warm standby units are subject to failures and are sent for repair to a repair facility with a single repairman, who is prone to failure. If the failed unit is not detected, the system enters into an unsafe...


Towards High-performance and Fault-tolerant Distributed Java Implementations

Java Virtual Machines form an important part of the web and business server market. Distributed Java Virtual Machines have the potential to make a significant contribution to industries that utilize this technology. An attractive platform for this purpose is the cluster, a highly cost-effective and scalable parallel computer model. However, realizing on such a platform a high performance virtua...


Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...


A Replicated Agent-Based Approach to Implementing a Reliable Mobile Code Paradigm

Using mobile agents, it is possible to bring the code close to the resources, which is not foreseen by the traditional client/server paradigm. Compared to the client/server computing paradigm, the greater flexibility of the mobile agent paradigm comes at additional costs as well as the additional complexity of developing and managing mobile agent-based applications. Such complexity ...


The Fault Tolerant Computer System of The Brazilian Scientific Application Microsatellites

A fault tolerant computer system has been conceived to become the standard framework that will be utilized by the future family of Brazilian small satellites for scientific applications. Based on the proposed standard, a computer system with three processing modules was developed for the First Brazilian Scientific Application Microsatellite (SACI-1 Satélite Científico). Each processing module i...



Publication date: 2009